Abstract. We study a random heteropolymer chain with a Gaussian distribution of monomer species.
Long-range correlations between monomer species are introduced. The mean-field analysis of
such a heteropolymer indicates the existence of an infinite energy barrier between the random-coil
and frozen states of the heteropolymer. Thus, the frozen state is kinetically inaccessible for a random
heteropolymer with power-law correlations in its monomer sequence. The relationship between our
results and some earlier results for DNA intron sequences is discussed.
The Lack of Long Range Correlations is a Necessary Condition
for a Functional Biologically Active Protein.

1. Introduction. 

The relationship between the sequence and the conformation of a protein macromolecule is one of the
great unsolved problems in biophysics. At present it is widely believed that functional
proteins usually form a single compact three-dimensional structure that corresponds to the global
energy minimum in conformational space. In recent years, a first step toward addressing this problem
has been to study random heteropolymers and compare them with proteins. The fact that even chains with
random sequences can have a unique ground state, characterized by a frozen path of the polymer chain
backbone, was first examined in terms of the random energy model (REM) [1,2]. A set of subsequent
investigations was carried out on the basis of "microscopic" Hamiltonians in which the interactions
between pairs of monomers were assumed to be random and independently drawn from a Gaussian
distribution [3], or in which the polymer sequences were explicitly present [4,5,6]. All these models
were shown to exhibit a freezing phase transition for a random chain.

Recently, Shakhnovich and Gutin [7] found that to have such a minimum it is sufficient that the amino
acid sequence forms an uncorrelated random sequence. These results raise the following
question: is the lack of long-range correlations in protein sequences a necessary condition for the
formation of a biologically functional three-dimensional structure? Here we give some arguments in
favor of an affirmative answer.

From this point of view it is very interesting that in recent years some results on long-range
(scale-invariant) correlations in non-coding DNA sequences have been obtained [8]. For example, it was
found that only non-coding DNA sequences exhibit long-range correlations. Some reports [9,10] support
this finding, but other authors [11,12] disagree. For example, Voss [10] recently proposed that coding
as well as non-coding DNA sequences display long-range power-law correlations in their base pair
sequences.

In the present paper the above-mentioned problem is examined for a heteropolymer with a
quenched random sequence described by a set of random variables $\sigma_k$, each characteristic of one
monomer. In the past, the monomer species were either considered as independent random variables [4-6]
or, as in [13], short-range (exponentially decaying) correlations between monomer species were
examined. Here we investigate the folding problem for a random heteropolymer whose quenched
monomer sequence has long-range (power-law) correlations.



2. Model and Mean-Field Theory. 

Let us consider a heteropolymer chain with a frozen sequence of monomers, described by a
Hamiltonian that is a function of the monomer coordinates $\{\vec r_j\}$. Our model Hamiltonian can be
written

»4E^-^AL,«-^ (21) 

where $B_{ij} = B_0 + B\sigma_i\sigma_j$ and $C$ are virial coefficients describing two- and three-particle
interactions, $\sigma_k$ is the variable describing the species of the $k$-th monomer, $a$ is the
statistical segment length, and $T$ is the temperature. We work in units where $k_B = 1$.

Earlier [1-6] this problem was discussed for the case of statistically independent variables $\sigma_k$.
In the present paper we treat $\sigma_k$ as random variables with long-range correlations, characterized
by a Gaussian distribution function of the form:



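$$P\{\sigma\} \propto \exp\left[-\tfrac{1}{2}\left(\sigma, K^{-1}\sigma\right)\right], \qquad \sigma = (\sigma_1, \ldots, \sigma_N) \tag{2.2}$$

where $K$ is the matrix describing correlations in the chain sequence:

$$K_{ij} = \langle \sigma_i \sigma_j \rangle_p \tag{2.3}$$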

There is some evidence that scale-invariant long-range correlations exist in DNA sequences only in
the non-coding regions of DNA (introns) (see, e.g., [14]). According to [14], the correlation function of
the monomer species in this case has the form:

$$K(l) \propto l^{\beta - 1} \tag{2.4}$$

where $0 < \beta < 1$ and $l = |i - j|$.
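To make the sequence model (2.2)-(2.4) concrete, the following minimal sketch draws Gaussian sequences with a power-law correlation matrix. It is only an illustration under stated assumptions: the names `power_law_correlation_matrix` and `sample_sequences` are hypothetical, and the correlation function is taken as $K(l) = (1 + l)^{\beta - 1}$, a regularized variant of (2.4) chosen here so that $K(0) = 1$ and the matrix stays positive definite.

```python
import numpy as np

def power_law_correlation_matrix(n_monomers, beta):
    """Toeplitz matrix K_ij = (1 + |i - j|)**(beta - 1), with 0 < beta < 1."""
    idx = np.arange(n_monomers)
    lag = np.abs(idx[:, None] - idx[None, :])
    # Assumption: the shift by 1 regularizes l = 0 and keeps K positive definite.
    return (1.0 + lag) ** (beta - 1.0)

def sample_sequences(k_matrix, n_samples, seed=0):
    """Draw sequences sigma ~ N(0, K) using the Cholesky factor of K."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(k_matrix)
    white = rng.standard_normal((k_matrix.shape[0], n_samples))
    return (chol @ white).T  # shape (n_samples, n_monomers)

K = power_law_correlation_matrix(n_monomers=64, beta=0.5)
sigma = sample_sequences(K, n_samples=20000)

# The empirical average <sigma_i sigma_j> should approach K_ij, cf. (2.3).
empirical = sigma.T @ sigma / sigma.shape[0]
print("max |<sigma_i sigma_j> - K_ij| =", np.max(np.abs(empirical - K)))
```

The Cholesky factor is used only as a convenient way to impose the covariance $K$; any other square root of $K$ would serve equally well.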
Now we find the free energy $F$. The standard way to treat the partition function of a
system with quenched disorder is to employ the replica trick:

$$F = \langle F(\sigma) \rangle_p = -T \lim_{n \to 0} \frac{d}{dn} \langle Z^n(\sigma) \rangle_p \tag{2.5}$$

where $\langle \cdot \rangle_p$ denotes the average over all possible realizations of $\sigma$. In these
terms the averaged partition function becomes:

(Z") p = J Drrgift - ff) expC-C^X..* 5 ^ ~ F /W ~ ^ X 

where $\alpha$ is the replica index, $\vec r_i^{\,\alpha}$ describes the position of the $i$-th monomer of
replica $\alpha$ in three-dimensional space, and $g(\vec r_{i+1}^{\,\alpha} - \vec r_i^{\,\alpha})$ is the
normalized Gaussian probability distribution such that $\int d\vec r\, g(\vec r) = 1$. After linearization
with respect to $\sigma$, we are led to:



(2.6) 



x-r t 
v J 



and putting into expression for 



x 
V J 



)x 



where 



Thus 



Z " ) p - \ D r f g(r? - r- ) exp(-C£ a J J x # (x) - fi / £ J J x # 

J D y a (x) exp[l / 2^ a<p \dxdyWa (x)Yp (*.?)- ( 2 - 7 ) 

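Here the one-replica densities and the two-replica overlap parameters are

$$\rho_\alpha(x) = \sum_i \delta(x - r_i^\alpha), \qquad q_{\alpha\beta}(x, y) = \sum_{i,j} K_{ij}\, \delta(x - r_i^\alpha)\, \delta(y - r_j^\beta) \tag{2.8}$$

so that

$$\langle Z^n \rangle_p \propto \int D\rho_\alpha\, Dq_{\alpha\beta}\, \exp[-F(\rho_\alpha, q_{\alpha\beta})] \tag{2.9}$$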
where $F(\rho_\alpha, q_{\alpha\beta}) = E(\rho_\alpha, q_{\alpha\beta}) - S(\rho_\alpha, q_{\alpha\beta})$ is the free energy functional, $E(\rho_\alpha, q_{\alpha\beta})$ is
the conformational energy, and $S(\rho_\alpha, q_{\alpha\beta})$ is the entropy of $n$ polymer chains corresponding to the
chain residue densities $\{\rho_\alpha\}$ and two-replica overlap parameters $\{q_{\alpha\beta}\}$:

- £(p a ,q ap ) = -C£jc£( Pa (*)) 3 - B £ J dx(p 2 a (x)) + In J DV|/ a (i) X 

i a 

X ex p|f Lap IL d ^¥a(^)¥ P (^9ap(^ , 9) ~ £ a J£ y d*d^ a (ir) Y p (#8(ir - y) 

(2.10) 

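$$\exp[S(\rho_\alpha, q_{\alpha\beta})] = \int Dr_i^\alpha\, g(r_{i+1}^\alpha - r_i^\alpha)\, \delta(\rho_\alpha - \hat\rho_\alpha)\, \delta(q_{\alpha\beta} - \hat q_{\alpha\beta})$$

where the hats mark the microscopic expressions for the densities and overlaps in terms of the monomer
coordinates. In the mean-field approximation we need to minimize the free energy functional
$F(\rho_\alpha, q_{\alpha\beta})$ over the one- and two-replica order parameters $\rho_\alpha$, $q_{\alpha\beta}$.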

The expressions obtained above are identical to those obtained in [4] for a random
sequence without correlations. The main difference is in the definition of the two-replica overlap
parameter $q_{\alpha\beta}$ (see eqn (2.7)).

Let us Fourier-transform the order parameter of the system:

where $V$ is the volume occupied by the macromolecule. This transformation leads to a new
expression for the conformational energy:



(2.12) 



Using the Gaussian properties of this integral, the expression for the conformational energy can be
rewritten as:

$$-E(q) = \mathrm{const} - \frac{1}{2} \sum_{k \ne 0} \ln \det P_{\alpha\beta}(k) \tag{2.13}$$

-a Jk) 



The microscopic order parameter of the system can be written in the following form:

$$q_{\alpha\beta}(x - y) = K\, Q_{\alpha\beta}(x - y) + \sum_{i \ne j} K_{ij}\, \delta(x - r_i^\alpha)\, \delta(y - r_j^\beta) \tag{2.14}$$

where $Q_{\alpha\beta}(x - y)$ is the two-replica overlap parameter used in several articles devoted to
random heteropolymers with non-correlated sequences [3,4]:

$$Q_{\alpha\beta}(x - y) = \sum_i \delta(x - r_i^\alpha)\, \delta(y - r_i^\beta) \tag{2.15}$$

From the normalization condition

$$\int Q_{\alpha\beta}(x - y)\, dx = \rho_\alpha \tag{2.16}$$
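the order parameter $Q_{\alpha\beta}(x - y)$ was found, as shown by Shakhnovich and Gutin [3], to be

$$Q_{\alpha\beta}(x - y) = \frac{\rho_\alpha}{R^d}\, \varphi_{\alpha\beta}\!\left(\frac{x - y}{R}\right) \tag{2.17}$$

where $R$ is the characteristic scale of the two-replica overlap and

$$\int dz\, \varphi_{\alpha\beta}(z) = 1 \tag{2.18}$$

The order parameter of the system, $q_{\alpha\beta}$, obviously satisfies the following normalization condition:

$$\int dy\, q_{\alpha\beta}(x, y) = K \rho_\alpha(x) \tag{2.19}$$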

Because the thermal average of the quantity $\delta(x - r_i^\alpha)\, \delta(y - r_j^\beta)$ may be interpreted as the a priori
probability $P_{ij}(x, y)$ that the $i$-th residue of replica $\alpha$ is located at $x$ and the $j$-th residue of
replica $\beta$ is located at $y$, the order parameter $q_{\alpha\beta}$ has to be sought in the following form:

$$q_{\alpha\beta}(x, y) = \frac{K \rho_\alpha}{R^d}\, \varphi_{\alpha\beta}\!\left(\frac{x - y}{R}\right) + \sum_{i \ne j} K_{ij}\, P_{ij}(x, y) \tag{2.20}$$

Taking into account the translation invariance of the system, the probability distribution $P_{ij}(x, y)$
has to be sought as

$$P_{ij}(x, y) = \mathrm{const} \cdot P_{ij}(y \mid x) \tag{2.21}$$

where $P_{ij}(y \mid x)$ = Prob{the $j$-th residue of replica $\beta$ is located at point $y$, given that the $i$-th
residue of replica $\alpha$ is located at point $x$} is the conditional probability distribution.

It is known [15] that the behavior of a polymer chain in the globular state is described by Gaussian
statistics. Accordingly, by analogy with [3,4] and taking into account the normalization condition
(eqn (2.19)), we use the following equation for $q_{\alpha\beta}$:



q a ,(x -y) = KQ a ,(x -y) + £ WJ K^(x - y) 



if (x -y)=\dzP(z-x,\j-§^ cp ap (^) 



(2.22) 



where 



P(x-z, j-ih oc(a 2 j-ih- d/2 exp 



21 . J 

a 7 _ A 

V I I J 



(2.23) 



Here $P(\vec r, |j - i|)$ is the probability distribution of the end-to-end vector $\vec r$ for the Gaussian
polymer chain. After simplification in the limit $N \to \infty$ we arrive at a new equation for the Fourier
transform of the order parameter:

$$q_{\alpha\beta}(k) = K \rho\, \varphi_{\alpha\beta}(kR) + 2\rho \sum_{l \ge 1} K(l)\, \exp(-a^2 k^2 l)\, \varphi_{\alpha\beta}(kR) \equiv \rho\, \varphi_{\alpha\beta}(kR)\, A(k) \tag{2.24}$$

where $k = |\vec k|$. It is obvious that without correlations in the polymer chain sequence our results
reduce to those obtained recently in [4].

Using the results of [16,4], this leads to the following form of the conformational energy in the limit
$n \to 0$:

E c\dx 

- = -!/ 2 IJ — ln M*) (2-25) 

where 

X k (x) = 1 / |4 - pAW - f dyM k (^| -*M k (x) (2.26) 

and the Parisi function $M_k(x)$ parametrizes the off-diagonal elements of the hierarchical matrix
$P_{\alpha\beta}(k)$ in the $n \to 0$ limit.
Now we have to minimize the free energy functional (see eqns (2.8), (2.9)). It is known [3,4]
that the replica-symmetric solution is invalid for random heteropolymers, because in this case the
entropy contribution has the form [3]:

$$S\{\rho_\alpha, q_{\alpha\beta}\} = (n - 1)\, \Delta S(R) \tag{2.27}$$

where $\Delta S(R)$ is the entropy loss of an ideal polymer chain confined in a tube of diameter $R$.
Here

$$\Delta S(R) \approx \begin{cases} -N a^2 / R^2, & R \gg a \\ N \ln(R/a), & R \ll a \end{cases} \tag{2.28}$$

Following the Parisi ansatz for one-step replica symmetry breaking, the $n$ replicas are divided into
$n/x$ groups with $x$ replicas per group. The entropy loss is therefore:

$$S\{\rho_\alpha, q_{\alpha\beta}\} = \frac{n}{x}\, (x - 1)\, \Delta S(R) \tag{2.29}$$
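The counting behind (2.29) can be illustrated with a small sketch for integer $n$ and $x$. This is only a bookkeeping illustration, not the analytic continuation $n \to 0$ used in the text, and the function names `one_step_rsb_groups` and `entropy_loss` are hypothetical. Within each group, $x - 1$ chains are constrained to follow a common path, which gives $(n/x)(x - 1)$ constrained chains in total.

```python
import numpy as np

def one_step_rsb_groups(n_replicas, group_size):
    """Boolean matrix: True where two distinct replicas belong to the same group."""
    if n_replicas % group_size != 0:
        raise ValueError("group_size must divide n_replicas")
    labels = np.arange(n_replicas) // group_size
    same_group = labels[:, None] == labels[None, :]
    np.fill_diagonal(same_group, False)
    return same_group

def entropy_loss(n_replicas, group_size, delta_s):
    """Eqn (2.29) for integer arguments: (n / x) * (x - 1) * Delta S(R)."""
    n_groups = n_replicas // group_size
    return n_groups * (group_size - 1) * delta_s

overlap = one_step_rsb_groups(n_replicas=6, group_size=3)
print(overlap.astype(int))                # two 3x3 blocks with zero diagonal
print(entropy_loss(6, 3, delta_s=-1.0))   # 2 groups * 2 constrained chains * (-1.0)
```

For $x = n$ all replicas fall into one group and the count reduces to the replica-symmetric result (2.27).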
In the same one-step R.S.B. scheme, the energy contribution takes the following form (for $n \to 0$):

E= E M = T/jdl\Un\b-xpMk/R)[l-Q[k)\\ + 

(2.30) 



(•-KM 



where b = 

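For the subsequent minimization, consider the behavior of the function $A(k)$ defined above (see eqn (2.24)):

$$A(k/R) = K + 2 \sum_{l \ge 1} K(l)\, \exp\!\left(-\frac{l\, a^2 k^2}{R^2}\right) \tag{2.31}$$

For the correlation function (2.4), $A(k/R)$ can be evaluated as

$$A(k/R) = K \left[ 1 + 2\, \Gamma(\beta) \left( \frac{R}{ak} \right)^{2\beta} \right] \quad \text{for } ak/R \ll 1 \tag{2.33}$$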

Consequently, the function $A(k/R)$ increases monotonically with $R/k$, and thus for
any value of $k$ we can find a scale $R$ such that the energy contribution in eqn (2.30) diverges as
$\ln[1 - \mathrm{const}\, (R/a)^{2\beta}]$. From the form of eqn (2.30) we can see that there is a wide region of
scales $R$ that is energetically forbidden, because the dependence of $E$ on $R$ has the form presented in
Fig. 1a. The entropy loss $\Delta S(R)$ due to restrictions on the scale $R$ has the form presented in Fig. 1b.
It is obvious that for small enough values of the scale $R$ we have a situation identical to that of [3,4]
(for an RHP with a non-correlated sequence). Consequently, the free energy for one-step R.S.B. has one
maximum at R = (see Fig. 1c) and, correspondingly, one thermodynamically stable state. Moreover, the
free energy for one-step R.S.B. has an infinite energy barrier, due to the above-mentioned
divergence, which separates the stable state with R « from the other (unfrozen) states.
In the case of two-step R.S.B. we carried out calculations analogous to eqn (2.30). For this scheme
of replica symmetry breaking the conformational energy also diverges at a large enough replica overlap
scale. Thus, the results described above are not an artifact of the one-step R.S.B. approximation but
reflect properties of the system examined here.

3. Discussion.

We have investigated some peculiarities that appear in heteropolymer chain folding in the presence
of long-range correlations in the residue sequence. The inter-residue correlations are defined by the
Gaussian distribution function (2.2) with the non-diagonal correlation matrix (2.3). In contrast to the
case of a non-correlated sequence, this probability distribution does not factorize into terms
corresponding to individual chain residues. In the present paper the standard techniques developed in [3]
were used. The results obtained above are mathematically similar to those in [4], but the redefinition
of the two-replica order parameter leads to unexpected physical properties of the system examined.
In the case of power-law decay of correlations (see eqn (2.4)), the off-diagonal terms that
contribute to the order parameter $q_{\alpha\beta}$ (see eqn (2.8)) lead to qualitatively different behavior of
the energy term (2.30). Taking into account the estimate (2.33), we can see that the energy contribution
of each harmonic $k$ is characterized by a spatial scale $R$ of divergence, defined by the following
expression:

$$b - x\rho\, A(k/R)\, [1 - \varphi(k)] = 0 \tag{3.1}$$

At large enough values of $R$ the last expression may be rewritten as
$$2 x \rho K\, \Gamma(\beta) \left( \frac{R}{ak} \right)^{2\beta} [1 - \varphi(k)] = b \tag{3.2}$$

Expressions (2.17) and (2.18) show that the characteristic decay scale of the function $\varphi(z)$ is
$|z| \sim 1$, and $\varphi(z) = \varphi(|z|)$ has the form schematically represented in Fig. 2. Consequently, there
exists $k^*$ such that $\varphi(k) < 1$ for any $k > k^*$. For example, for $\varphi(r) \propto \exp(-r^2/2)$ we have
$\varphi(k) \propto \exp(-k^2/2)$. Thus, for $k > k^*$ eqn (3.2) has a solution, defined by the expression



$$R = ak \left[ \frac{b}{2 x \rho K\, \Gamma(\beta)\, [1 - \varphi(k)]} \right]^{1/(2\beta)} \tag{3.3}$$



and the system energy may be considered as a superposition of the contributions of different harmonics,
as represented in Fig. 3. It is obvious that for large enough values of the scale $R$ the system energy
diverges. Thus, for the polymer globule only the frozen state (with R ~ ) is stable.
Taking into account the normalization condition for $g(r)$ (see the explanations to eqn (2.6)), the
free energy of the random coil state in the mean-field theory of the polymer globule [15] is evaluated
as $F = 0$. Consequently, our system has two stable states: one corresponding to the frozen path of the
chain backbone, and the other to the random coil state. Because the system energy diverges at large
enough values of the scale $R$, these stable states are separated by an infinite energy barrier. This
result may be interpreted as follows. A heteropolymer chain with a long-range correlated sequence can
exist in the folded (R ~ ) or random coil state, but the folded state is kinetically inaccessible. Thus,
power-law correlations make folding of the random heteropolymer impossible.
It is interesting that exponential decay of correlations,

$$K(l) \propto \exp(-l/\xi) \tag{3.4}$$

in the limit $\xi \gg 1$ is equivalent to $\beta = 1$ in (2.4), which is the marginal case of maximal
long-range correlations in the sequence

= P^(^){l + 2L, 1 ex p[- / («^ 2 +V£)]} = 
= p^(^){l + 2^ ie xp(-a 2 F/)} 

If we have then 
q ap (k ) = py ap (kR){l + exp(- / / £)} ^ ^

which is completely equivalent to the order parameter obtained in [4] for a heteropolymer with a
non-correlated sequence. This is quite natural, because for $\xi \sim 1$ the chain sequence may be divided
into sufficiently small pieces that are statistically independent.

The results obtained above are in agreement with the hypothesis that long-range correlations occur
only in non-coding DNA sequences [8]. Recently, Shakhnovich and Gutin [7] found that for an RHP to have
an energetically stable folded state it is sufficient that the monomer sequence forms an uncorrelated
random sequence. Our results show that the lack of long-range sequence correlations is a necessary
condition for RHP folding.
Figure captions

Fig. 1a. Conformational energy vs. scale of replica overlap ($R$). Here $V$ is the excluded
volume of a chain residue and $R_0$ is the scale of the conformational energy divergence.

Fig. 1b. Conformational entropy vs. scale of replica overlap ($R$). Here $V$ is the excluded
volume of a chain residue and $R_0$ is the scale of the conformational energy divergence.

Fig. 1c. The solid line is the free energy vs. scale of replica overlap ($R$). The dashed line is the
free energy vs. scale $R$ for an RHP with a non-correlated sequence of residues (see, e.g., [3,4]).

Fig. 2. Behavior of the overlap function vs. the dimensionless scale of replica overlap.

Fig. 3. Scheme of the contributions of different harmonics to the conformational energy. $R_k$ is the
scale of divergence for wave vector length $k$ (see eqn (3.3)).